Protocols for Fault-Tolerant Distributed-Shared-Memory on the SOME-Bus Multiprocessor Architecture
Authors
Abstract
The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network that directly links arbitrary pairs of processor nodes without contention, and can efficiently interconnect over one hundred nodes. Each node has a dedicated output channel and an array of receivers, with one receiver dedicated to every other node’s output channel. The SOME-Bus eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Under the Distributed Shared Memory (DSM) paradigm, the SOME-Bus allows strong integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence mechanism. Backward Error Recovery fault-tolerance techniques can rely on DSM data replication and SOME-Bus broadcasts with little additional network traffic and correspondingly little performance degradation. This paper presents three protocols for fault-tolerant DSM and uses simulation to examine the performance of the protocols on the SOME-Bus multiprocessor architecture.
1: The SOME-Bus Architecture
Due to advances in fiber optics and VLSI technology, it is possible to design an architecture that relies on broadcasts to support hardware-based DSM, allowing the implementation of coherence protocols at the cache block level through interactions of cache, memory and network interface controllers. Fault-tolerance protocols on this system rely on data replication and broadcasts to implement Backward Error Recovery, resulting in little additional cost in terms of network traffic. The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) [1] is such a network. One of its key features is that each node has a dedicated broadcast channel, realized by a specific group of wavelengths in a specific fiber, and an input channel interface based on an array of receivers, shown in Figure 1, which simultaneously monitors all channels.
This design results in an effectively fully-connected network. Although the SOME-Bus can utilize software techniques for implementing cache coherence, it allows strong integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence mechanism.
Figure 1: SOME-Bus Parallel Receiver Array and Output Coupler
Distributed shared memory (DSM) systems can be easier to program than systems that use a message-passing model. In the DSM paradigm, the SOME-Bus can most readily support a CC-NUMA system where the shared virtual address space is distributed across local memories that can be accessed both by the local processor and by processors from remote nodes, with different access latencies. Snooping is one common technique to maintain coherence. It requires that all caches see every memory write request from every processor, and it has limited the scalability of DSM systems in the past because the interconnection network saturates quickly, even with only a few processors. The SOME-Bus does not encounter the same problem. Every processor may simply broadcast, on its own channel, messages that cause updates or invalidations at remote caches. Every receiver can also monitor its input channels for invalidation messages and signal the cache controller to take appropriate action when locally cached data is affected. Although the possibility of interconnection network saturation is eliminated, intense cache consistency traffic can still saturate the cache controller. A SOME-Bus-based system can therefore take advantage of directory-based techniques that notify only those remote caches with affected data blocks.
IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, Ft. Lauderdale, Florida, April 2002
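The trade-off described above, between broadcasting invalidations to every node and notifying only the caches that hold the affected block, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's protocol or simulator: the names `Node`, `SomeBusModel`, `write_broadcast`, `write_directory` and the `sharers` directory are hypothetical, and the model counts only how many cache controllers must process each invalidation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: a private cache plus a count of coherence messages handled."""
    node_id: int
    cache: dict = field(default_factory=dict)  # block address -> cached value
    messages_handled: int = 0                  # load placed on the cache controller

    def receive_invalidation(self, block):
        # The receiver array delivers the message; the cache controller processes it.
        self.messages_handled += 1
        self.cache.pop(block, None)


class SomeBusModel:
    """Toy model: each node owns one output channel that every other node can hear."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.sharers = {}  # assumed directory: block address -> set of node ids

    def read(self, node_id, block, value=0):
        """Cache a block at a node and record the node as a sharer."""
        self.nodes[node_id].cache[block] = value
        self.sharers.setdefault(block, set()).add(node_id)

    def write_broadcast(self, writer, block, value):
        """Snooping style: every other node processes the invalidation."""
        targets = [n for n in self.nodes if n.node_id != writer]
        return self._invalidate(writer, block, value, targets)

    def write_directory(self, writer, block, value):
        """Directory style: only nodes recorded as sharers are notified."""
        ids = self.sharers.get(block, set()) - {writer}
        targets = [self.nodes[i] for i in sorted(ids)]
        return self._invalidate(writer, block, value, targets)

    def _invalidate(self, writer, block, value, targets):
        for node in targets:
            node.receive_invalidation(block)
        self.nodes[writer].cache[block] = value
        self.sharers[block] = {writer}  # the writer is now the only holder
        return len(targets)             # number of cache controllers involved


if __name__ == "__main__":
    bus = SomeBusModel(num_nodes=8)
    for reader in (1, 2):
        bus.read(reader, block=0x40)
    print(bus.write_directory(writer=0, block=0x40, value=7))  # 2
    print(bus.write_broadcast(writer=0, block=0x40, value=9))  # 7
```

In this model neither write contends for a shared medium, since the writer uses only its own channel. The difference lies in how many cache controllers do work per write: all other nodes under broadcast, but only the recorded sharers under the directory scheme.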
Similar resources
Fault-Tolerant Distributed-Shared-Memory on a Broadcast-Based Interconnection Network
The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network which directly links arbitrary pairs of processor nodes without contention, and can efficiently interconnect over one hundred nodes. Each node has a dedicated output channel and an array of receivers, with one receiver dedicated to every other node’s output channel. The SOME-...
A comparison of Broadcast-based and Switch-based Networks of Workstations
Networks of Workstations have been mostly designed using switch-based architectures and programming based on message passing. This paper describes a network of workstations based on the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) which is a low-latency, high-bandwidth interconnection network that directly links arbitrary pairs of processor nodes without contention, and can effic...
Global Bus Design of a Bus-Based COMA Multiprocessor DICE
DICE is a shared-bus multiprocessor based on a distributed shared-memory architecture, known as Cache-Only Memory Architecture (COMA). Unlike previous COMA proposals for large-scale multiprocessing, DICE utilizes the COMA to effectively decrease the gap between modern high-performance microprocessors and the bus. As microprocessors become faster and demand more bandwidth, the already limited sc...
Merging, sorting and matrix operations on the SOME-Bus multiprocessor architecture
Due to advances in fiber-optics and VLSI technology, interconnection networks which allow multiple simultaneous broadcasts are becoming feasible. This paper presents the multiprocessor architecture of the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus), and examines the performance of representative algorithms for matrix operations, merging and sorting, using the message-passing and...
Transient Processor/Bus Fault Tolerance for Embedded Systems
We propose an approach to build fault-tolerant distributed real-time embedded systems. From a given system description (application algorithm and architecture) and a given fault hypothesis (type and number of faults to be tolerated), we generate automatically a static fault-tolerant multiprocessor schedule of the algorithm components on the target architecture, which minimizes the schedule leng...